¹ Graduate School of Agricultural and Life Sciences, University of Tokyo, 1-1-1, Midori-cho, Nishi-Tokyo, Tokyo, 188-0002, Japan
² Technology Innovation R&D Dept. I, Kubota Corporation, 1-11, Takumi-cho, Sakai-ku, Sakai-shi, Osaka, 590-0908, Japan
Received 26 May 2025 | Accepted 24 Sep 2025 | Published 28 Oct 2025
Accurate 3D phenotyping of agricultural produce remains challenging because existing sensing technologies trade reconstruction quality against acquisition throughput. RGB-D cameras enable high-throughput scanning in operational settings such as harvesting conveyors, but they produce incomplete, low-quality 3D models. Conversely, close-range Structure-from-Motion (SfM) produces high-quality reconstructions but is not suitable for high-throughput field applications. This study bridges the gap with 3DPotatoTwin, a paired dataset of 339 tuber samples across three cultivars collected in Hokkaido, Japan. The dataset uniquely combines (1) conveyor-acquired RGB-D point clouds, (2) ground measurements, (3) SfM reconstructions acquired in a controlled indoor environment, and (4) aligned model pairs with their transformation matrices. The multi-sensor alignment uses a semi-supervised, pin-guided pipeline that incorporates single-pin extraction and referencing, cross-strip matching, and binary-color-enhanced ICP, achieving a registration accuracy of 0.59 ± 0.11 mm. Beyond serving as a benchmark for 3D phenotyping algorithms, the dataset enables the training of 3D completion networks that reconstruct high-quality 3D models from partial RGB-D point clouds. In addition, the proposed semi-automated annotation pipeline has the potential to accelerate 3D dataset generation in similar studies, and the methodology is applicable more broadly to multi-sensor data fusion in crop phenotyping. The dataset and the pipeline source code are publicly available at HuggingFace and GitHub, respectively.
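As an illustration of how an aligned model pair with its transformation matrix might be used, the following minimal sketch applies a 4 × 4 homogeneous transform to a partial point cloud and reports the mean nearest-neighbour distance to a reference cloud. It rests on several assumptions: the matrix layout (row-major, mapping RGB-D coordinates into the SfM frame), the toy coordinates, and the function names `apply_transform` and `mean_nn_distance` are all hypothetical, and the distance computed here is a generic alignment check rather than the registration-accuracy measure reported above.

```python
import math


def apply_transform(points, T):
    """Apply a 4x4 homogeneous transform T (row-major nested lists) to 3D points."""
    transformed = []
    for x, y, z in points:
        transformed.append(
            tuple(T[r][0] * x + T[r][1] * y + T[r][2] * z + T[r][3] for r in range(3))
        )
    return transformed


def mean_nn_distance(source, target):
    """Mean distance from each source point to its nearest target point (brute force)."""
    return sum(min(math.dist(p, q) for q in target) for p in source) / len(source)


# Toy "SfM" reference cloud (units assumed to be mm).
sfm_points = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)]

# Toy partial "RGB-D" cloud: three of the reference points, rotated 90 degrees
# about the z-axis and shifted by (5, -2, 1).
rgbd_points = [(5.0, -2.0, 1.0), (5.0, 8.0, 1.0), (-5.0, -2.0, 1.0)]

# Hypothetical stored transform mapping RGB-D coordinates into the SfM frame
# (the inverse of the rotation and shift described above).
T_rgbd_to_sfm = [
    [0.0, 1.0, 0.0, 2.0],
    [-1.0, 0.0, 0.0, 5.0],
    [0.0, 0.0, 1.0, -1.0],
    [0.0, 0.0, 0.0, 1.0],
]

error_before = mean_nn_distance(rgbd_points, sfm_points)
aligned_points = apply_transform(rgbd_points, T_rgbd_to_sfm)
error_after = mean_nn_distance(aligned_points, sfm_points)

print(f"mean NN distance before alignment: {error_before:.2f} mm")
print(f"mean NN distance after alignment:  {error_after:.2f} mm")
```

The brute-force nearest-neighbour search keeps the sketch dependency-free; for real point clouds a spatial index such as a k-d tree would be the more practical choice.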